Protein Remote Homology Detection by Combining Profile-based Protein Representation with Local Alignment Kernel
Abstract
Protein remote homology detection has attracted a great deal of interest because it is one of the most important problems in bioinformatics. Profile-based methods have recently achieved state-of-the-art performance. A key step in improving these methods is finding a suitable way to use the evolutionary information contained in the profiles. In this study, we propose a profile-based protein representation that extracts evolutionary information from frequency profiles. In this approach, the frequency profiles calculated from the multiple sequence alignments produced by PSI-BLAST are converted into several profile-based proteins, and the local alignment (LA) kernel is then applied to these profile-based proteins for prediction. Our experiments on a well-known benchmark show that the proposed approach can significantly improve predictive performance.
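The abstract does not spell out how a frequency profile becomes "several profile-based proteins", so the following is only a minimal sketch under stated assumptions. It assumes the k-th profile-based protein is built by taking the k-th most frequent amino acid at every profile position, and it uses a plain Smith-Waterman score with simple match/mismatch/gap values as a stand-in for the LA kernel (it is not the LA kernel itself). The names `profile_to_proteins`, `smith_waterman_score`, and `toy_row`, and all scoring parameters, are hypothetical and chosen for illustration.

```python
# Column order assumed for each profile row (an assumption, not from the source).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_row(freqs):
    """Hypothetical helper: build one 20-value profile row from a dict
    such as {"A": 0.6, "C": 0.4}; unspecified amino acids get 0.0."""
    row = [0.0] * len(AMINO_ACIDS)
    for aa, f in freqs.items():
        row[AMINO_ACIDS.index(aa)] = f
    return row


def profile_to_proteins(profile, top_n):
    """Assumed conversion: the rank-k protein uses the k-th most frequent
    amino acid at each position of the frequency profile."""
    proteins = []
    for rank in range(top_n):
        residues = []
        for row in profile:
            # Sort column indices by descending frequency; Python's sort is
            # stable, so ties keep the AMINO_ACIDS order.
            order = sorted(range(len(AMINO_ACIDS)), key=lambda i: -row[i])
            residues.append(AMINO_ACIDS[order[rank]])
        proteins.append("".join(residues))
    return proteins


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with linear gap penalty.
    Stand-in similarity only; the LA kernel sums over all local alignments."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    profile = [
        toy_row({"A": 0.6, "C": 0.3, "D": 0.1}),
        toy_row({"W": 0.5, "Y": 0.4, "F": 0.1}),
    ]
    proteins = profile_to_proteins(profile, top_n=2)
    print(proteins)
    print(smith_waterman_score(proteins[0], proteins[0]))
```

In a full pipeline, pairwise scores between the profile-based proteins of two sequences would be fed to a kernel-based classifier; how the scores from the several profile-based proteins are combined is not stated in the abstract and is left open here.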
Related resources
Profile-based direct kernels for remote homology detection and fold recognition
MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We...
Incorporating homologues into Sequence Embeddings for protein Analysis
Statistical and learning techniques are becoming increasingly popular for different tasks in bioinformatics. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly applicable to discrete sequences such as protein sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space...
Structure-based Kernel for Remote Homology Detection
Remote homology detection is a central problem in computational biology. Currently, the most effective tools for addressing this problem are kernel-based discriminative methods employing support vector machines. These methods work by transforming the protein sequences into a (possibly high-dimensional) vector space, called the feature space, and deriving a kernel function in the feature space, whic...
Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection
MOTIVATION: The problems of protein fold recognition and remote homology detection have recently attracted a great deal of interest as they represent challenging multi-feature multi-class problems for which modern pattern recognition methods achieve only modest levels of performance. As with many pattern recognition problems, there are multiple feature spaces or groups of attributes available, s...
Learned Random-Walk Kernels and Empirical-Map Kernels for Protein Sequence Classification
Biological sequence classification (such as protein remote homology detection) based solely on sequence data is an important problem in computational biology, especially in the current genomics era, when large amounts of sequence data are becoming available. Support vector machines (SVMs) based on mismatch string kernels were previously applied to solve this problem, achieving reasonable success...